Sara Nayer

Distributed PCA

Problem Formulation


Principal Component Analysis (PCA) is an unsupervised, non-parametric statistical technique primarily used for dimensionality reduction in machine learning. In conventional PCA, the goal is to recover a low-rank matrix from measurements $\boldsymbol{y}_{ik} = \langle \boldsymbol{X}^*, \boldsymbol{A}_{ik}\rangle$, $i = 1,\ldots, m$, $k = 1,\ldots, q$. Here $\boldsymbol{X}^*$ is the ground-truth matrix and $\boldsymbol{X}^* = \boldsymbol{U}^*\boldsymbol{B}^*$, where $\boldsymbol{U}^*$ and $\boldsymbol{B}^*$ are low-dimensional matrices of size $n\times r$ and $r \times q$, respectively. The matrices $\boldsymbol{A}_{ik}$, each of size $n \times q$, are the design matrices.
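As a concrete illustration, the measurement model above can be simulated in a few lines of NumPy. This sketch is ours, not part of the paper's MATLAB code, and the dimensions are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, r, m = 20, 15, 3, 50          # illustrative dimensions

# Ground truth: X* = U* B* has rank r.
U_star = rng.standard_normal((n, r))
B_star = rng.standard_normal((r, q))
X_star = U_star @ B_star            # n x q, rank r

# Design matrices A_ik (each n x q) and measurements y_ik = <X*, A_ik>.
A = rng.standard_normal((m, q, n, q))
y = np.einsum('ikab,ab->ik', A, X_star)   # y[i, k] = trace(A_ik' X*)
```

Note that each scalar measurement involves the full $n \times q$ matrix $\boldsymbol{X}^*$, which is what makes the conventional formulation inherently centralized.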

A main issue with conventional PCA is that recovering the matrix $\boldsymbol{X}^*$ requires a centralized algorithm, and thus faces memory and computational complexity problems. To overcome these issues, we present a distributed version of the problem where \[ \boldsymbol{y}_{ik} = \langle \boldsymbol{x}^*_k, \boldsymbol{a}_{ik} \rangle. \] The design vectors $\boldsymbol{a}_{ik}$ are mutually independent. The goal is still to recover $\boldsymbol{U}^*$ and $\boldsymbol{B}^*$, and hence $\boldsymbol{X}^*$. The system consists of a master server and $q$ computing nodes.
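The column-wise model is the special case of the previous one in which $\boldsymbol{A}_{ik}$ contains $\boldsymbol{a}_{ik}$ in its $k$-th column and zeros elsewhere, so each measurement touches only one column of $\boldsymbol{X}^*$. A NumPy sketch with illustrative dimensions (again ours, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, r, m = 20, 15, 3, 50          # illustrative dimensions

U_star = rng.standard_normal((n, r))
B_star = rng.standard_normal((r, q))
X_star = U_star @ B_star            # column k is x*_k

# Independent design vectors a_ik; measurement i at node k
# involves only the k-th column x*_k of X*.
a = rng.standard_normal((m, q, n))
y = np.einsum('ikn,nk->ik', a, X_star)    # y[i, k] = <x*_k, a_ik>
```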

The idea is that once we have a good estimate of $\boldsymbol{U}^*$, say $\hat{\boldsymbol{U}}$, the $k$-th node can use it to compute an estimate $\hat{\boldsymbol{b}}_k$ of the $k$-th column of $\boldsymbol{B}^*$. Each node then sends $\hat{\boldsymbol{b}}_k$ to the master server, and the server updates $\hat{\boldsymbol{U}}$. The process iterates until a stopping criterion is met. More precisely, at each iteration the server sends $\boldsymbol{y}_{ik}$ and $\hat{\boldsymbol{U}}'\boldsymbol{a}_{ik}$, for $i=1,\ldots,m$, to the $k$-th node, and the node estimates $\hat{\boldsymbol{b}}_k$. The nodes then send the $\hat{\boldsymbol{b}}_k$'s to the server, which uses them to update $\hat{\boldsymbol{U}}$. This gives a decentralized algorithm for PCA that is efficient from both the memory and computation points of view. We present a gradient-based algorithm and show that if \[ mq \geq C(n+q)r^2, \] then the algorithm converges with high probability. Since the minimum number of measurements needed to recover a rank-$r$ matrix of size $n \times q$ is $(n+q)r$, our algorithm needs only $r$ times more measurements than this minimum. Moreover, nodes may enter or leave the computation at any time.
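The alternating node/server loop described above can be sketched as a single-machine simulation. This is a simplified illustration only, not the paper's exact algorithm: the step size `eta`, the random orthonormal initialization, and the QR re-orthonormalization step are our assumptions (the paper uses a spectral initialization and its own step-size rule; see the paper for details).

```python
import numpy as np

rng = np.random.default_rng(2)
n, q, r, m = 20, 10, 2, 80
eta = 0.1 / (m * q)                 # assumed step size

U_star = rng.standard_normal((n, r))
B_star = rng.standard_normal((r, q))
X_star = U_star @ B_star
a = rng.standard_normal((m, q, n))              # design vectors a_ik
y = np.einsum('ikn,nk->ik', a, X_star)          # y_ik = <x*_k, a_ik>

# Random orthonormal initialization (the paper uses spectral init).
U_hat, _ = np.linalg.qr(rng.standard_normal((n, r)))

for _ in range(100):
    # "Node" step: node k solves a small least-squares problem for b_k,
    # using only y[:, k] and the projected designs U_hat' a_ik.
    B_hat = np.empty((r, q))
    for k in range(q):
        M = a[:, k, :] @ U_hat                  # rows are (U_hat' a_ik)'
        B_hat[:, k], *_ = np.linalg.lstsq(M, y[:, k], rcond=None)

    # "Server" step: one gradient step on U for the squared loss
    # sum_ik (a_ik' U b_k - y_ik)^2, then QR to keep U_hat orthonormal.
    resid = np.einsum('ikn,nk->ik', a, U_hat @ B_hat) - y
    grad = np.einsum('ik,ikn,jk->nj', resid, a, B_hat)
    U_hat, _ = np.linalg.qr(U_hat - eta * grad)
```

In this sketch the server stores only the $n \times r$ matrix $\hat{\boldsymbol{U}}$ and each node only its own $r$-vector $\hat{\boldsymbol{b}}_k$, which is the source of the memory savings over the centralized formulation.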

For more information and the full algorithm, please refer to the following paper, and please cite it when you use its MATLAB code.

S. Nayer and N. Vaswani, "Fast and Sample-Efficient Federated Low Rank Matrix Recovery from Column-wise Linear and Quadratic Projections", .